1. INTRODUCTION

Hypertension is a degenerative disease and a serious problem worldwide. Approximately 65 million people have been diagnosed with hypertension, about one in three adults, and 28% of Americans have prehypertension [1]. However, only 31% of patients achieve the target blood pressure, that is, a systolic blood pressure below 140 mmHg and/or a diastolic blood pressure below 90 mmHg. In Indonesia, the prevalence of hypertension reaches 31.7%. Previous research found that 65.8% of patients at the Harapan Kita polyclinic had hypertension and only 39.3% of them reached the blood pressure target [2]. Although 66% of hypertensive patients take medication regularly, 60.7% of these have not reached the target blood pressure. Similarly, previous research at the heart polyclinic of RSU dr. Saiful Anwar Malang showed that only 20.3% of hypertensive patients achieved the target blood pressure [3]. These results show a high rate of uncontrolled hypertension.

Uncontrolled hypertension is a risk factor for cardiovascular events such as coronary heart disease and cerebrovascular disease [4]. According to the World Health Organization (WHO), uncontrolled hypertension contributes to 7 million deaths among people of productive age and 64 million disabilities [5]. Therefore, efforts to lower blood pressure aim to reach the target through adequate therapy, which is very important for reducing the mortality and morbidity of hypertension-related diseases. The causes of hypertension are mostly multifactorial, including the activity of the renin-angiotensin system, increased sympathetic activity, obesity, stress, excessive salt consumption, and genetics. Angiotensin receptor blockers (ARBs), like angiotensin-converting enzyme inhibitors (ACEi), are powerful vasodilators that inhibit the Renin-Angiotensin-Aldosterone System (RAAS). Therapy using ACEi and ARBs has proven many clinical benefits and is widely used in clinical practice [6]. Uncontrolled blood pressure may be related to adherence, the choice of drug combination, or genetic variants of molecules involved in the RAAS. Research in Japan showed that the renin C-5312T variant, a nucleotide substitution from C to T at position -5312, was an independent predictor of resistance to ARB therapy [7].

On the other hand, many researchers have developed tools for diagnosing diseases using bioinformatics approaches, including machine learning algorithms. Previous research performed pairwise DNA sequence alignment between hepatitis B virus (HBV) and hepatocellular carcinoma (HCC) using modified dynamic programming, which improved computational space and time performance [8]. Related research has also addressed breast cancer classification using logistic regression and diabetes analytics using a data mining approach. In addition, brain tumor image segmentation and classification has been implemented using a hybrid clustering and segmentation strategy [9-11]. Furthermore, the Support Vector Machine (SVM) is a machine learning algorithm with high accuracy in medical research. It has been applied to detect hypertension based on the radial pulse wave and on risk factors such as obesity, stress, systolic and diastolic blood pressure, physical exercise, cigarette consumption, and dietary lifestyle [12-13]. However, the data involved in this research are incomplete, and several features have null values. Therefore, this study proposes K-NN Imputation and the SVM algorithm to implement a system that predicts the drug therapy response of hypertensive patients based on their genetic variation characteristics. The system is developed using data from patients treated with antihypertensive drugs, including polymorphisms of the angiotensinogen and renin genes, which play an important role in the cardiovascular system. The first step is preprocessing the data to handle missing values using K-NN Imputation; an SVM model is then constructed to predict the drug therapy response of hypertensive patients.


2. HYPERTENSION PATHOLOGY

Hypertension is a complex pathophysiological disease. Systemic blood pressure regulation is multifactorial; it is basically the end result of cardiac autoregulation and peripheral vascular resistance (Figure 1) [14]. The renin-angiotensin-aldosterone system (RAAS) plays an important role in the regulation of blood pressure, electrolyte balance, and the pathogenesis of atherosclerosis [15-17]. The initial phase is increased RAAS activity, which occurs through increased production of angiotensinogen and/or increased expression or activity of renin (REN). Renin catalyzes the cleavage of angiotensinogen (AGT) into angiotensin I, which angiotensin-converting enzyme (ACE) then converts to angiotensin II. Angiotensin II increases blood pressure through strong vasoconstriction and sodium retention [15-16]. As an illustration, this cascade is shown in Figure 1.
Figure 1. The renin-angiotensin-aldosterone system (RAAS) in regulating blood pressure
Furthermore, given the pathophysiological complexity of hypertension, there are various antihypertensive therapies that aim to inhibit the underlying pathophysiology. The classes of antihypertensive drugs include RAAS inhibitors, consisting of ACEi and ARBs, calcium channel blockers (CCBs), diuretics, and sympathetic nervous system inhibitors such as beta blockers and alpha blockers; the details are shown in Table 1 [18].


Table 1. Antihypertensive Drugs and Dosages

| Type of Drug | Name of Drug | Dosage |
|---|---|---|
| ACE inhibitors | Captopril | 12.5-50 mg twice daily |
| | Enalapril | 5-40 mg once daily or in two equally divided doses |
| | Fosinopril | 10-40 mg once daily |
| | Lisinopril | 5-40 mg once daily |
| | Perindopril erbumine | 4-8 mg once daily |
| | Perindopril arginine | 5-10 mg once daily |
| | Quinapril | 5-40 mg once daily or in two equally divided doses |
| | Ramipril | 2.5-10 mg once daily or in two equally divided doses |
| | Trandolapril | 1-4 mg once daily |
| Calcium channel blockers (dihydropyridine) | Amlodipine | 2.5-10 mg once daily |
| | Felodipine | 5-20 mg once daily (controlled release) |
| | Lercanidipine | 10-20 mg once daily |
| | Nifedipine | 10-40 mg once daily (conventional); 20-120 mg once daily (controlled release) |
| Calcium channel blockers (nondihydropyridine) | Diltiazem | 180-360 mg once daily (controlled release) |
| | Verapamil | 120-240 mg once daily (controlled release) |
| Angiotensin II receptor antagonists | Candesartan | 8-16 mg once daily |
| | Eprosartan | 600-800 mg once daily |
| | Irbesartan | 150-300 mg once daily |
| | Losartan | 50-100 mg once daily |
| | Telmisartan | 20-80 mg once daily |
| | Olmesartan | 20-40 mg once daily |
| Thiazide diuretics | Chlorthalidone | 12.5-25 mg once daily |
| | Hydrochlorothiazide | 12.5-25 mg once daily |
| | Indapamide | 1.25-2.5 mg once daily |
| Beta-blockers | Bisoprolol | 1.25-10 mg once daily |
| | Atenolol | 25-100 mg once daily |
| | Carvedilol | 12.5-50 mg once daily |
| | Labetalol | 100-400 mg twice daily |
| | Metoprolol tartrate | 50-100 mg twice daily |
| | Metoprolol succinate (controlled release) | 12-190 mg daily |
| | Oxprenolol | 40-160 mg twice daily |
| Other | Clonidine | 50-300 µg twice daily |
| | Hydralazine | 12.5-100 mg twice daily |
| | Methyldopa | 125-500 mg twice daily |
| | Moxonidine | 200-600 µg daily |
| | Prazosin | 0.5-10 mg twice daily |





3. RESEARCH METHOD 


The system consists of two main stages: data preprocessing using the K-NN Imputation algorithm and prediction using the SVM method, as shown in Figure 2.
Figure 2. Flowchart of the prediction system design for hypertension drug therapy response


3.1. K-Nearest Neighbor (K-NN) Imputation 

K-Nearest Neighbor (K-NN) Imputation is a method for estimating missing attribute values based on the similarity between new cases and old cases over the relevant features. According to Olivas [19], it is a machine learning technique that handles missing values by imputing them from the most similar records.

In the first stage, the input data are taken from the medical records of hypertensive patients. Because a number of values are missing, they must be filled in using this method, based on the similarity between records. First, the records with complete values are separated from those with incomplete values. Next, the Euclidean distance between each incomplete record and the complete records is calculated, and the complete records are sorted by similarity. Finally, the most frequent value among the nearest records is selected as the imputed value.
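The imputation steps above can be sketched in Python with only the standard library. This is a minimal illustration under assumptions the paper does not state: records are equal-length lists of already-numeric feature values with `None` marking a missing entry, the Euclidean distance is computed only over the features the incomplete record actually has, and the number of neighbours `k` is a free parameter (the paper does not report the value it used). The function name `knn_impute` is hypothetical.

```python
import math
from collections import Counter


def knn_impute(records, k=3):
    """Fill None entries with the most frequent value among the k nearest complete records.

    Hypothetical helper: records are equal-length lists of numbers, None marks a missing value.
    """
    complete = [r for r in records if None not in r]
    if not complete:
        raise ValueError("at least one complete record is needed")

    result = []
    for rec in records:
        if None not in rec:
            result.append(list(rec))  # complete records are kept unchanged
            continue

        observed = [j for j, v in enumerate(rec) if v is not None]

        def distance(other):
            # Euclidean distance over the features present in the incomplete record
            return math.sqrt(sum((rec[j] - other[j]) ** 2 for j in observed))

        neighbours = sorted(complete, key=distance)[:k]  # most similar complete records first
        filled = list(rec)
        for j, v in enumerate(rec):
            if v is None:
                # majority vote ("most frequent value") among the neighbours
                filled[j] = Counter(n[j] for n in neighbours).most_common(1)[0][0]
        result.append(filled)
    return result


data = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 0],
    [1, 0, None],
    [None, 1, 0],
]
print(knn_impute(data, k=2))  # [[1, 0, 1], [1, 0, 1], [0, 1, 0], [0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

Taking the most frequent neighbour value suits categorical features such as genotypes; for continuous measurements such as LDL, the mean of the neighbours' values is a common alternative.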


3.2. Support Vector Machine (SVM)

The Support Vector Machine (SVM) is a linear classification method that finds the best hyperplane separating two classes in the input space. Its basic concept is a linear classifier, which was later extended to non-linear classification by incorporating the kernel trick in a high-dimensional space, as illustrated in Figure 3.

Figure 3 illustrates the SVM method. The thick black line in the middle is the hyperplane that separates the data labeled +1 from the data labeled -1; in this study, the two classes are the positive and negative drug therapy responses. The points closest to the hyperplane are called support vectors, and they lie on the thin black lines. The distance between a support vector and the hyperplane is called the margin.

Basically, SVM is a linear classifier that can only be used for linearly separable data. It is therefore extended with a kernel trick to classify non-linear data, for which the data must be transformed into a high-dimensional vector space. The kernel functions commonly used in non-linear SVM classification are the polynomial, Gaussian (RBF), and sigmoid kernels. Each label is denoted $y_i \in \{-1, +1\}$ for $i = 1, 2, \dots, n$, where $n$ is the number of data points. Assuming that the +1 and -1 classes can be completely separated by a hyperplane, the hyperplane is defined as:


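$$ w \cdot x + b = 0 \tag{1} $$

A data point $x_i$ belongs to class -1 if it satisfies (2):

$$ w \cdot x_i + b \le -1 \tag{2} $$

and to class +1 if it satisfies (3):

$$ w \cdot x_i + b \ge +1 \tag{3} $$

The largest margin is found by maximizing the distance between the hyperplane and the nearest points. Since this distance equals $1/\lVert w \rVert$ when the constraints are tight, maximizing it is equivalent to solving (4):

$$ \min_{w,\,b} \ \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \dots, n \tag{4} $$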
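Constraints (2) and (3) combine into the single condition $y_i(w \cdot x_i + b) \ge 1$ used in (4). The short Python sketch below, with illustrative helper names that are not part of the paper, checks this condition for a candidate hyperplane and computes the margin $1/\lVert w \rVert$ on a one-dimensional toy example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def satisfies_constraints(w, b, xs, ys):
    """Check (2) for y = -1 and (3) for y = +1 in the combined form y_i (w . x_i + b) >= 1."""
    return all(y * (dot(w, x) + b) >= 1 for x, y in zip(xs, ys))


def margin(w):
    """Distance from the hyperplane w . x + b = 0 to the nearest points when the constraints are tight."""
    return 1 / math.sqrt(dot(w, w))


xs, ys = [[1.0], [3.0]], [-1, 1]
for w, b in ([[1.0], -2.0], [[2.0], -4.0]):
    print(w, b, satisfies_constraints(w, b, xs, ys), margin(w))
```

Both hyperplanes satisfy the constraints and place the boundary at x = 2, but w = 1 gives a margin of 1.0 while w = 2 gives only 0.5, which is why (4) minimizes $\lVert w \rVert$.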

In general, real problems are not linearly separable, so the two classes cannot be completely separated by a hyperplane. SVM therefore needs to be modified by introducing kernel functions. The concept of non-linear SVM is to map the data $x$ with a function $\Phi(x)$ into a higher-dimensional vector space, in which the data are represented so that they can be separated by a hyperplane.

The SVM learning process finds the support vectors using the dot products of the data transformed into the new space. These dot products can be calculated without knowing the transformation $\Phi$ explicitly. The kernel function therefore simplifies the SVM learning process of determining the support vectors for non-linear data [20]. The kernel function is formulated as (5):


$$ K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j) \tag{5} $$
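Equation (5) can be checked numerically. Because the RBF kernel used in this research corresponds to an infinite-dimensional $\Phi$, the Python sketch below uses the degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$ in two dimensions as a stand-in: its explicit map $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ is small enough to write out. The function names are illustrative.

```python
import math


def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in two dimensions."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, evaluated without constructing phi."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, z), dot(phi(x), phi(z)))  # both approximately 121
```

The kernel needs one 2-D dot product, whereas the explicit route builds two 3-D vectors first; the saving grows quickly with the degree and the input dimension.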


This research applies the Radial Basis Function (RBF) kernel, defined in (6):


$$ K(x_i, x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \right) \tag{6} $$
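A minimal Python sketch of the RBF kernel and the kernel matrix over a set of training points follows. It assumes the common form $\exp(-\lVert x_i - x_j \rVert^2 / (2\sigma^2))$, since the exact expression in the source is garbled, and defaults $\sigma$ to 2, the value reported in Table 3. The helper names are hypothetical.

```python
import math


def rbf_kernel(xi, xj, sigma=2.0):
    """RBF kernel (6): exp(-||xi - xj||^2 / (2 sigma^2)); sigma defaults to the Table 3 value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def kernel_matrix(xs, sigma=2.0):
    """Pairwise kernel values between all training points (symmetric, ones on the diagonal)."""
    return [[rbf_kernel(a, b, sigma) for b in xs] for a in xs]


points = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
for row in kernel_matrix(points):
    print([round(v, 4) for v in row])
```

Values close to 1 indicate similar patients and values near 0 indicate dissimilar ones; a larger $\sigma$ makes the similarity decay more slowly with distance.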


The next step is to make predictions using the Sequential Support Vector Machine method, which includes calculating the Hessian matrix and iterating until the maximum number of iterations is reached or $\max_i |\delta\alpha_i| < \varepsilon$. After that, the bias and the similarity (kernel values) between the testing data and the training data are calculated. As a result, each testing sample is assigned to the positive or negative class. Figure 4 shows the flow diagram of the sequential Support Vector Machine process.
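The sequential SVM loop described above can be sketched in Python as follows. The paper does not spell out its update rule, so this sketch assumes the sequential learning scheme commonly attributed to Vijayakumar and Wu: the Hessian is taken as $D_{ij} = y_i y_j (K(x_i, x_j) + \lambda^2)$, each multiplier is updated by $\delta\alpha_i = \min(\max(\gamma(1 - E_i), -\alpha_i), C - \alpha_i)$ with $E_i = \sum_j \alpha_j D_{ij}$, and training stops when $\max_i |\delta\alpha_i| < \varepsilon$ or the iteration limit is reached. In this formulation the $\lambda^2$ term plays the role of the bias, so the sketch does not reproduce the separate bias step shown in Figure 4. Function names and default values are hypothetical.

```python
import math


def rbf(xi, xj, sigma):
    """RBF kernel exp(-||xi - xj||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2 * sigma ** 2))


def train_sequential_svm(xs, ys, lam=0.5, gamma=0.5, C=1.0, eps=1e-6, max_iter=1000, sigma=1.0):
    """Sequential SVM training; returns the Lagrange multipliers alpha (hypothetical helper)."""
    n = len(xs)
    # Hessian matrix: D_ij = y_i * y_j * (K(x_i, x_j) + lambda^2)
    D = [[ys[i] * ys[j] * (rbf(xs[i], xs[j], sigma) + lam ** 2) for j in range(n)]
         for i in range(n)]
    alpha = [0.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            E_i = sum(alpha[j] * D[i][j] for j in range(n))
            # step of size gamma towards E_i = 1, clipped so that 0 <= alpha_i <= C
            delta = min(max(gamma * (1.0 - E_i), -alpha[i]), C - alpha[i])
            alpha[i] += delta
            max_change = max(max_change, abs(delta))
        if max_change < eps:  # stopping rule max|delta alpha| < epsilon
            break
    return alpha


def predict(x, xs, ys, alpha, lam=0.5, sigma=1.0):
    """Sign of sum_i alpha_i y_i (K(x_i, x) + lambda^2); the lambda^2 term acts as the bias."""
    score = sum(a * y * (rbf(xi, x, sigma) + lam ** 2) for a, y, xi in zip(alpha, ys, xs))
    return 1 if score >= 0 else -1


xs = [[0.0], [1.0], [3.0], [4.0]]
ys = [-1, -1, 1, 1]
alpha = train_sequential_svm(xs, ys)
print([round(a, 3) for a in alpha])
print([predict(x, xs, ys, alpha) for x in ([0.5], [3.5])])  # [-1, 1]
```

Mirroring Table 3 would mean calling the functions with `lam=0.9`, `sigma=2.0`, `C=0.1`, `eps=0.001`, and `max_iter=10`. Table 3 does not list the learning rate `gamma`; it is usually kept below $2 / \max_i D_{ii}$ (here $2 / (1 + \lambda^2)$ for the RBF kernel) so that the updates remain stable.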


4. RESULT AND DISCUSSION 

The data were taken from the heart polyclinic of RSU dr. Saiful Anwar Malang. The features are detailed in Table 2, and the classes are balanced, with the same number of data in each class. Before testing with different data, a validation test achieved an accuracy rate of 100%, which suggests that the system is reliable.


Table 2. Features of the Data Set for Hypertensive Patients under Drug Therapy

| Feature | Remark |
|---|---|
| Code | Patient identifier |
| Gender | Male/Female |
| Age | Birth date |
| Ethnic | Javanese or others |
| Waist C | Waist circumference |
| Hip C | Hip circumference |
| Weight | Weight of patient |
| Height | Height of patient |
| Smoking | Active/passive |
| Menopause | Cessation of menstruation |
| Hypertension | History of hypertension |
| Ur | Ureum |
| Cr | Creatinine |
| HDL | High-density lipoprotein |
| LDL | Low-density lipoprotein |
| TG | Triglycerides |
| Cholesterol | Total cholesterol level |
| Glycemia | Blood sugar level |
| AGT pre | Angiotensinogen level before intervention (drug) |
| AGT post | Angiotensinogen level after intervention (drug) |
| DeltaAGT | Difference between AGT pre and AGT post |
| Difference AGT | Up/down |
| Quartile AGT | Quartile of angiotensinogen level |
| Q3AGTPost | Last quartile of angiotensinogen |
| AGT217_1 | Polymorphism of AGT-217 |
| AGT20 | Polymorphism of AGT-20 |
| AGT6 | Polymorphism of AGT-6 |
| REN5312_1 | Polymorphism of renin-5312 |
| SBP_ABPM1 | Systolic blood pressure before intervention (drug) |
| SBP_ABPM2 | Systolic blood pressure after intervention (drug) |
| Undercontrol | Under control within normal limits before intervention |
| Alteration_in_24 | Alteration of blood pressure over 24 hours (before/after intervention) |
| DBP_ABPM1 | Diastolic blood pressure before intervention |
| DBP_ABPM2 | Diastolic blood pressure after intervention |
| DM | Diabetes mellitus |
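Some features in Table 2 are stored in raw form or derived from other features. As an illustration only, the Python sketch below derives age in whole years from the recorded birth date and computes DeltaAGT together with its up/down direction. The sign convention DeltaAGT = AGT pre - AGT post, the reference date used for age (for example, the enrolment date), and the extra "unchanged" label for equal values are assumptions, since the table does not fix them; the helper names are hypothetical.

```python
from datetime import date


def age_on(birth_date, ref_date):
    """Age in whole years at ref_date (Table 2 stores Age as a birth date)."""
    years = ref_date.year - birth_date.year
    if (ref_date.month, ref_date.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


def agt_change(agt_pre, agt_post):
    """Assumed convention: DeltaAGT = AGT pre - AGT post, direction from the post-treatment level."""
    delta = agt_pre - agt_post
    if agt_post < agt_pre:
        direction = "down"
    elif agt_post > agt_pre:
        direction = "up"
    else:
        direction = "unchanged"  # not a category in Table 2; added for completeness
    return delta, direction


print(age_on(date(1960, 5, 20), date(2018, 5, 19)))  # 57
print(agt_change(30.0, 25.0))  # (5.0, 'down')
```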








Figure 3. Support vector machine 
Figure 4. Flowchart of prediction using sequential support vector machine
The interface of the prediction system is shown in Figure 5. The input parameters for training and testing include lambda, gamma, C, epsilon, and the number of iterations.
Figure 5. Implementation interface


Based on the experimental results, an accuracy rate of 90% is achieved with the best parameter values listed in Table 3.


Table 3. Optimal Parameter Values for Testing Using the SVM Algorithm

| Parameter | Value |
|---|---|
| Lambda (λ) | 0.9 |
| Sigma (σ) | 2 |
| C | 0.1 |
| Epsilon (ε) | 0.001 |
| Number of iterations | 10 |





The accuracy of the testing results is affected by many factors, including the preprocessing of missing values with the KNNI algorithm, which has an accuracy rate of 87%. Using the imputed data, the subsequent prediction process achieves an accuracy of 90%.
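The paper reports accuracy percentages without giving the formula. Assuming the standard definition for a two-class problem, with $TP$, $TN$, $FP$, and $FN$ denoting the numbers of true positives, true negatives, false positives, and false negatives on the test set:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%
```

Under this definition, an accuracy of 90% means that 9 out of every 10 test patients were assigned to the correct drug therapy response class.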


5. CONCLUSION

An identification system for the drug therapy response of hypertensive patients has been implemented using a combination of the K-Nearest Neighbor (K-NN) Imputation and Support Vector Machine (SVM) algorithms. The missing values were filled in with K-NN Imputation based on a similarity measure over the attribute values, applied in the initial stage before the SVM algorithm was used for prediction. Overall, an accuracy of 90% was achieved.